Recently, great progress has been made in single-image super-resolution (SISR) based on deep learning. However, existing methods usually incur a large computational cost, and activation functions cause some intermediate-layer features to be lost. It is therefore challenging to make a model lightweight while reducing the impact of intermediate feature loss on reconstruction quality. In this paper, we propose a Feature Interaction Weighted Hybrid Network (FIWHN) to alleviate these problems. Specifically, FIWHN consists of a series of novel Wide-residual Distillation Interaction Blocks (WDIB) as the backbone, where every three WDIBs form a Feature Shuffle Weighted Group (FSWG) through mutual information mixing and fusion. In addition, to mitigate the adverse effect of intermediate feature loss on the reconstruction results, we introduce well-designed Wide Convolutional Residual Weighting (WCRW) and Wide Identical Residual Weighting (WIRW) units in WDIB, and effectively cross-fuse features of different finenesses through a Wide-residual Distillation Connection (WRDC) framework and a Self-Calibrating Fusion (SCF) unit. Finally, to complement the global features that CNN models lack, we introduce a Transformer into our model and explore a new way of combining the CNN and the Transformer. Extensive quantitative and qualitative experiments on low-level and high-level tasks show that the proposed FIWHN achieves a good balance between performance and efficiency, and better supports downstream tasks operating in low-resolution scenarios.
Learning with noisy labels is a vital topic for practical deep learning, as models should be robust to noisy open-world datasets in the wild. The state-of-the-art noisy-label learning approach JoCoR fails when faced with a large ratio of noisy labels. Moreover, selecting small-loss samples can also cause error accumulation: once noisy samples are mistakenly selected as small-loss samples, they are more likely to be selected again. In this paper, we deal with error accumulation in noisy-label learning from both the model and the data perspectives. From the model perspective, we introduce a mean-point ensemble to exploit a more robust loss function and more information from unselected samples, reducing error accumulation. From the data perspective, since flipped images have the same semantic meaning as the original images, we select small-loss samples according to the loss values of the flipped images instead of the original ones, again reducing error accumulation. Extensive experiments on CIFAR-10, CIFAR-100, and the large-scale Clothing1M show that our method outperforms state-of-the-art noisy-label learning methods under different levels of label noise. Our method can also be seamlessly combined with other noisy-label learning methods to further improve their performance, and it generalizes well to other tasks. The code is available at https://github.com/zyh-uaiaaaa/MDA-noisy-label-learning.
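As a rough illustration of the data-perspective idea, small-loss selection over flipped inputs can be sketched as follows. This is a minimal NumPy toy, not the paper's implementation: the linear "model", image shapes, and keep ratio are illustrative stand-ins.

```python
import numpy as np

def per_sample_ce(logits, labels):
    # Numerically stable per-sample cross-entropy.
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels]

def select_small_loss(model, images, labels, keep_ratio=0.5):
    """Select likely-clean samples by the loss of *flipped* images,
    so that memorized (noisy) originals are less likely re-selected."""
    flipped = images[:, :, ::-1]                 # horizontal flip, (N, H, W)
    logits = model(flipped.reshape(len(images), -1))
    losses = per_sample_ce(logits, labels)
    n_keep = int(keep_ratio * len(images))
    return np.argsort(losses)[:n_keep]           # indices of small-loss samples

# Toy usage: a fixed linear "model" over flattened pixels.
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * 4, 3))
model = lambda x: x @ W
images = rng.normal(size=(10, 4, 4))
labels = rng.integers(0, 3, size=10)
kept = select_small_loss(model, images, labels, keep_ratio=0.5)
print(len(kept))  # 5
```

In practice the loss would come from the trained network's forward pass on flipped batches, and `keep_ratio` would be scheduled against the estimated noise rate.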
Facial expression recognition (FER) is a challenging problem because expression components are always entangled with other irrelevant factors, such as identity and head pose. In this work, we propose an Identity and Pose Disentangled Facial Expression Recognition (IPD-FER) model to learn more discriminative feature representations. We regard the holistic facial representation as a combination of identity, pose, and expression, encoded by three separate encoders. For the identity encoder, a well-pretrained face recognition model is used and kept fixed during training, which relieves the restriction of prior works that require specific expression training data and makes disentanglement on in-the-wild datasets feasible. Meanwhile, the pose and expression encoders are optimized with their corresponding labels. Combining identity and pose features, the decoder should generate a neutral face of the input individual; when the expression feature is added, the input image should be reconstructed. By comparing the difference between the synthesized neutral and expressional images of the same individual, the expression component is further disentangled from identity and pose. Experimental results verify the effectiveness of our method on both lab-controlled and in-the-wild databases, achieving state-of-the-art recognition performance.
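The two reconstruction targets described above can be sketched in toy form. The paper uses CNN encoders and a decoder; here small linear maps stand in, and all names and dimensions are hypothetical, purely to show the objective structure.

```python
import numpy as np

# Toy stand-ins: linear maps instead of the paper's CNN encoders/decoder.
rng = np.random.default_rng(1)
D = 8                                     # per-factor feature dim (hypothetical)
dec_W = rng.normal(size=(3 * D, 2 * D))   # "decoder" weights (toy)

def decode(f_id, f_pose, f_expr):
    """Decode concatenated identity/pose/expression features to an image vector."""
    return np.concatenate([f_id, f_pose, f_expr]) @ dec_W

def ipd_fer_losses(x, x_neutral, f_id, f_pose, f_expr):
    """Identity+pose must reconstruct the neutral face; adding the
    expression feature must reconstruct the input image itself."""
    neutral_hat = decode(f_id, f_pose, np.zeros_like(f_expr))
    full_hat = decode(f_id, f_pose, f_expr)
    l_neutral = np.mean((neutral_hat - x_neutral) ** 2)
    l_recon = np.mean((full_hat - x) ** 2)
    return l_neutral, l_recon

# Self-consistent toy: targets generated by the same decoder, so both losses vanish.
f_id, f_pose, f_expr = (rng.normal(size=D) for _ in range(3))
x = decode(f_id, f_pose, f_expr)
x_neutral = decode(f_id, f_pose, np.zeros(D))
l_neutral, l_recon = ipd_fer_losses(x, x_neutral, f_id, f_pose, f_expr)
```

The disentangling pressure comes from forcing the expression code to carry exactly the residual between the neutral and expressional reconstructions of the same identity.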
Noisy-label facial expression recognition (FER) is more challenging than traditional noisy-label classification tasks due to inter-class similarity and annotation ambiguity. Recent works mainly tackle this problem by filtering out large-loss corrupted samples. In this paper, we explore dealing with noisy labels from a new feature-learning perspective. We find that FER models memorize noisy samples by focusing on a part of the features that can be considered related to the noisy labels, instead of learning from the whole features that lead to the latent truth. Inspired by this, we propose a novel Erasing Attention Consistency (EAC) method to automatically suppress noisy samples. Specifically, we first utilize the flip semantic consistency of facial images to design an imbalanced framework. We then randomly erase input images and use flip attention consistency to prevent the model from focusing on partial features. EAC significantly outperforms state-of-the-art noisy-label methods and generalizes well to other tasks with a large number of classes, such as CIFAR-100 and Tiny-ImageNet. The code is available at https://github.com/zyh-uaiaaaa/Erasing-Attention-Consistency.
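A minimal sketch of the two ingredients, random erasing and flip attention consistency, assuming a generic attention-map function. The real method operates on CAM-style attention inside a CNN; everything below is a simplified stand-in.

```python
import numpy as np

def random_erase(img, rng, frac=0.3):
    """Zero a random rectangle of the input (simplified random erasing)."""
    h, w = img.shape
    eh, ew = max(1, int(h * frac)), max(1, int(w * frac))
    y = rng.integers(0, h - eh + 1)
    x = rng.integers(0, w - ew + 1)
    out = img.copy()
    out[y:y + eh, x:x + ew] = 0.0
    return out

def flip_attention_consistency(attn_fn, img):
    """Mean |A(flip(x)) - flip(A(x))|: small when attention follows face
    semantics, large when the model latches onto a one-sided shortcut."""
    a = attn_fn(img)
    a_flip = attn_fn(img[:, ::-1])
    return float(np.mean(np.abs(a_flip - a[:, ::-1])))

rng = np.random.default_rng(0)
img = random_erase(rng.normal(size=(8, 8)), rng)
# A flip-equivariant "attention" is perfectly consistent...
sym_loss = flip_attention_consistency(lambda im: np.abs(im), img)
# ...while attention stuck on the left half is penalized.
left_only = lambda im: np.abs(im) * (np.arange(im.shape[1]) < im.shape[1] // 2)
biased_loss = flip_attention_consistency(left_only, img)
```

Adding the consistency term to the classification loss penalizes exactly the partial-feature focusing that, per the abstract, drives memorization of noisy samples.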
Due to the limited diversity of existing datasets, the generalization ability of pose estimators is poor. To address this problem, we propose a pose augmentation solution via a DH forward kinematics model, which we call DH-AUG. We observe that prior work is all based on single-frame pose augmentation; if it is directly applied to a video pose estimator, several previously ignored problems arise: (i) angle ambiguity in bone rotation (multiple solutions); (ii) the generated skeleton videos lack motion continuity. To solve these problems, we propose a special generator based on the DH forward kinematics model, called the DH generator. Extensive experiments show that DH-AUG can greatly improve the generalization ability of video pose estimators. In addition, when applied to a single-frame 3D pose estimator, our method outperforms the previous best pose augmentation method. The source code has been released at https://github.com/hlz0606/DH-AUG-DH-Forward-Kinematics-Model-Driven-Augmentation-for-3D-Human-Pose-Estimation.
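The DH forward kinematics underlying the generator follows the standard Denavit-Hartenberg link transform. A minimal sketch is below; the link parameters are illustrative, not the paper's human-skeleton parameterization.

```python
import numpy as np

def dh_transform(theta, d, a, alpha):
    """Standard Denavit-Hartenberg link transform (4x4 homogeneous matrix)."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

def forward_kinematics(params):
    """Chain DH transforms along a kinematic chain (e.g. a limb) and
    return the position of the final joint."""
    T = np.eye(4)
    for theta, d, a, alpha in params:
        T = T @ dh_transform(theta, d, a, alpha)
    return T[:3, 3]

# One unit-length link with no rotation ends at x = 1.
straight = forward_kinematics([(0.0, 0.0, 1.0, 0.0)])
# Two unit links with the base joint rotated 90 degrees reach (0, 2, 0).
bent = forward_kinematics([(np.pi / 2, 0.0, 1.0, 0.0), (0.0, 0.0, 1.0, 0.0)])
```

Because joint angles, not raw 3D coordinates, are the free variables here, sampling smooth angle trajectories yields skeleton videos with the motion continuity that single-frame augmentation lacks.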
Face forgery detection methods based on convolutional neural networks achieve remarkable results during training but struggle to maintain comparable performance at test time. We observe that detectors tend to focus on content information rather than artifact traces, which indicates that detectors are sensitive to the intrinsic bias of datasets and leads to severe overfitting. Motivated by this key observation, we design an easily embeddable disentanglement framework to remove content information, and further propose a Content Consistency Constraint (C2C) and a Global Representation Contrastive Constraint (GRCC) to enhance the independence of the disentangled features. Furthermore, we construct two unbalanced datasets to investigate the impact of content bias. Extensive visualizations and experiments demonstrate that our framework can not only ignore the interference of content information but also guide the detector to mine suspicious artifact traces and achieve competitive performance.
With the emergence of GANs, face forgery techniques have been seriously abused, making accurate forgery detection urgent. Inspired by the fact that the PPG signal corresponds to the periodic skin-color change caused by the heartbeat in face videos, we observe that, although the PPG signal is inevitably degraded during the forgery process, a mixture of PPG signals still remains in forged videos, with a unique rhythmic pattern depending on the generation method. Motivated by this key observation, we propose a framework for face forgery detection and categorization, consisting of: 1) a Spatial-Temporal Filtering Network (STFNet) for PPG signal filtering, and 2) a Spatial-Temporal Interaction Network (STINet) for the constraint and interaction of PPG signals. Moreover, with insight into how forgery methods generate videos, we further exploit intra-source and inter-source cues to boost the performance of the framework. Overall, extensive experiments demonstrate the superiority of our method.
With increasing privacy concerns around face recognition, federated learning has emerged as one of the most prevalent approaches to studying unconstrained face recognition with private decentralized data. However, in the face recognition scenario, conventional decentralized federated algorithms that share the whole parameters of the network among clients suffer from privacy leakage. In this work, we introduce a framework, FedGC, to tackle federated learning for face recognition with a stronger privacy guarantee. We explore a novel idea of correcting gradients from the perspective of back propagation, and propose a softmax-based regularizer that corrects the gradients of class embeddings by precisely injecting a cross-client gradient term. Theoretically, we show that FedGC constitutes a valid loss function similar to standard softmax. Extensive experiments have been conducted to validate the superiority of FedGC, which can match the performance of conventional centralized methods that utilize the full training dataset on several popular benchmark datasets.
In this paper, we aim to improve the performance of facial expression recognition (FER) by exploiting omni-supervised learning. Current state-of-the-art methods typically aim to recognize facial expressions in a controlled environment by training models on a limited number of samples. To enhance the robustness of the learned models across various scenarios, we propose to perform omni-supervised learning by exploiting labeled samples together with a large number of unlabeled data. In particular, we first employ MS-Celeb-1M as the facial pool, which contains about 5,822K unlabeled face images. Then, a primitive model trained on the small set of labeled samples is adopted to select samples with high confidence by conducting feature-based similarity comparison. We find that the new dataset constructed in this omni-supervised manner can significantly improve the generalization ability of the learned FER model and hence boost performance. However, as more training samples are used, more computational resources and training time are required, which is usually not affordable in many circumstances. To relieve the requirement on computational resources, we further adopt a dataset distillation strategy to distill the target-task-related knowledge from the newly mined samples and compress it into a very small set of images. This distilled dataset is capable of boosting the performance of FER with few additional computational costs. We perform extensive experiments on five popular benchmarks and a newly constructed dataset, where consistent gains can be achieved under various settings using the proposed framework. We hope this work serves as a solid baseline and helps facilitate future research on FER.
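The feature-based similarity selection of high-confidence unlabeled samples can be sketched as follows. The cosine-similarity measure, nearest-anchor assignment, and threshold are illustrative assumptions; the abstract does not specify the exact selection rule.

```python
import numpy as np

def select_confident(unlabeled_feats, labeled_feats, labeled_y, thresh=0.9):
    """Assign each unlabeled face the label of its most similar labeled
    sample (cosine similarity) and keep only highly confident matches."""
    def norm(f):
        return f / np.linalg.norm(f, axis=1, keepdims=True)
    sims = norm(unlabeled_feats) @ norm(labeled_feats).T   # (N_u, N_l)
    best = sims.argmax(axis=1)       # nearest labeled anchor per unlabeled face
    conf = sims.max(axis=1)          # its similarity score
    keep = conf >= thresh
    return np.flatnonzero(keep), labeled_y[best[keep]]

labeled_feats = np.eye(3)                       # three labeled anchors (toy)
labeled_y = np.array([0, 1, 2])
unlabeled_feats = np.array([[1.0, 0.01, 0.0],   # clearly class 0
                            [0.6, 0.6, 0.0],    # ambiguous -> rejected
                            [0.0, 0.0, 5.0]])   # clearly class 2
idx, pseudo_y = select_confident(unlabeled_feats, labeled_feats, labeled_y)
```

In the omni-supervised setting, the features would come from the primitive FER model's penultimate layer, and the kept `(idx, pseudo_y)` pairs would be added to the training pool before distillation.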
Deep learning applies multiple processing layers to learn representations of data with multiple levels of feature extraction. This emerging technique has reshaped the research landscape of face recognition (FR) since 2014, launched by the breakthroughs of DeepFace and DeepID. Since then, deep learning techniques, characterized by hierarchical architectures that stitch together pixels into invariant face representations, have dramatically improved state-of-the-art performance and fostered successful real-world applications. In this survey, we provide a comprehensive review of recent developments in deep FR, covering broad topics on algorithm designs, databases, protocols, and application scenes. First, we summarize the different network architectures and loss functions proposed in the rapid evolution of deep FR methods. Second, we categorize the related face processing methods into two classes: "one-to-many augmentation" and "many-to-one normalization". Third, we summarize and compare the commonly used databases for both model training and evaluation. Fourth, we review miscellaneous scenes in deep FR, such as cross-factor, heterogeneous, multiple-media, and industrial scenes. Finally, the technical challenges and several promising directions are highlighted.